Signalling In Written Text: A Corpus-Based Approach

Author

  • Marie-Paule Pery-Woodley
Abstract

The concern of this paper is the signalling of segments and relations in written texts. It explores the role of visual formatting and its relation to lexical and other markers. Through a corpus-based study of a specific "text object" (definitions in instructional texts), it brings together two models of text structure: RST and the model of text architecture. Unlike RST, this latter model gives a central place to signalling, establishing a theoretically motivated relation of functional equivalence between markers based on typography or layout and lexico-syntactic markers. Definitions in the corpus are characterised on the basis of configurations of markers, and their occurrences are charted in the global structure of the text. The distribution of definition patterns highlights the dynamic nature of text: markers of a specific text object vary systematically according to where it occurs in the structural hierarchy of the text. The study establishes a relation between text objects and RST segments, thus opening the range of discourse markers to include visual formatting, and providing RST segments with a textual status.

Introduction

Discourse relations are heterogeneous; text organisation seems to work on several distinct levels (cf. Moore and Pollack 1992). This complexity has been the focus of much recent research, with a number of authors appealing to Halliday's tripartite distinction of linguistic metafunctions (ideational, interpersonal and textual) in order to articulate different perspectives on discourse organisation, or different levels of description (Maier and Hovy 1993, Bateman and Rondhuis 1997). These authors explored ways in which the metafunctions could provide an organising principle for the classification of discourse relations and markers (otherwise classified as semantic vs. pragmatic, subject-matter vs. presentational, etc.).
The textual metafunction, described by Halliday and Hasan (1976) as "the text-forming component in the linguistic system", comprising "the resources that language has for creating text" (ibid: 26), has tended to receive the least developed treatment. The focus of this paper is the textual metafunction, and its aim is to contribute to an understanding of the "resources" that are exploited to create textual meaning, more specifically markers of relations and segment boundaries. My approach belongs in corpus linguistics, and is therefore guided by an awareness of the diversity of language productions. A first factor of variation is domain: a number of studies are concerned with the linguistic characterisation of domain sublanguages (Grishman and Kittredge 1986; Sager, Friedman et al. 1987). A second factor is genre, which subsumes social function, discourse purpose and channel. This study focusses on written texts with a specific discourse function (instructional) within a particular domain: software manuals. The specificity of written texts and its relevance to an understanding of discourse organisation must be stressed. Firstly, in most cases, writing implies that the writer¹ and the intended audience do not share the context of communication. This has two major consequences for the organisation of written text: a) a written text is generally a monologue, where topics are introduced, continued or dropped not through negotiation between discourse participants but on the sole basis of the writer's representations and intentions; b) there is a requirement for explicitness in the signalling of the various levels of meaning. Secondly, a written text is a visual object, and its visual properties are directly involved and exploited by readers in the construction of meaning.
The choice of instructional texts derives from a hypothesis linked to the explicitness requirement: the social function of these texts is such that their writers are likely to try to leave as little interpretative leeway as possible. They therefore constitute a good starting point for a study of organisational signals. Discourse theorists are generally agreed on a recursive structuring involving text segments and discourse relations. Many questions remain open, however, over the signalling of relations and the nature and status of the segments. In RST, the authors stress the absence of specific signalling of rhetorical relations. As for the segments concerned, the minimal units are defined as "typically clauses", but Mann and Thompson specify that the relations in fact hold between the

¹ I use the word writer for convenience, even though the production of a text may involve several agents.

Publication date: 1998